ABSTRACT

This paper reviews the literature about item response
models for the subject level and aggregated level (group level).
Group-level item response models (IRMs) are used in the United States
in large-scale assessment programs such as the National Assessment of
Educational Progress and the California Assessment Program. In the
Netherlands, these models are useful to the National Institute for
Educational Measurement, especially for the Dutch National Assessment
Program of Educational Achievement. After a short introduction on
IRMs on the subject level, a comprehensive treatment is given of the
following estimation methods for subject-level parameters: joint
maximum likelihood, conditional maximum likelihood, marginal maximum
likelihood, logit-based parameter estimation, the Bayesian approach,
and other estimation procedures. A group-level IRM describes the
probability of a correct response from an examinee selected at random
from a specific group. The following group-level models are
described: the group fixed-effects model, the two-parameter and
three-parameter normal-normal model, the normal-logistic model, and
the California Assessment Program model. Analogies and differences
between group-level and subject-level IRMs are discussed. Group-level
IRMs may be justified as aggregate descriptions of IRMs on
subject-level, and they may be interpreted analogously. Group-level
IRMs are implied by subject-level IRMs only when within-group ability
distributions are identical except for location. For the
subject-level, the addition of an examinee increases the number of
incidental (ability) parameters; however, for the group-level, the
number of ability parameters does not increase. (RLC)
Abstract

This paper contains a review of the literature about item
response models for the subject level and the aggregated level
(group level).

After a short introduction on item response models at the
subject level, a comprehensive treatment is given of the
following estimation methods for subject-level parameters:
joint maximum likelihood, conditional maximum likelihood,
marginal maximum likelihood, logit-based parameter
estimation, the Bayesian approach, and some less familiar
procedures.

A group-level item response model describes the
probability of a correct response from an examinee selected
at random from a specific group. The following group-level
models are described: the group fixed-effects model, the two-
and three-parameter normal-normal model, the normal-logistic
model and the Californian Assessment Program (CAP) model.

Finally, the analogies and differences between group-level
and subject-level item response models are discussed.
Item Response Theory at Subject- and Group-Level
Introduction

Item response theory (IRT) can be seen as a reaction to the
well-documented shortcomings of classical test theory
(Fischer, 1974; Hambleton & Swaminathan, 1985).

An item response model specifies a relationship between
the observable item performance of an examinee and the latent
trait or ability assumed to underlie the performance on that
item. The item characteristic curve (ICC) is a central construct
in item response theory. Generally, an ICC is a monotonically
increasing mathematical function ranging from zero to one
that gives the probability that an examinee with a given
ability level answers the item correctly. In the one-parameter
model, also called the Rasch model, sufficient
statistics are available: the relative item difficulty can be
estimated independently of the sample of examinees used, and
estimators of the relative examinee ability are independent
of the particular subset of items drawn from a certain item
domain. This feature makes item response models particularly
useful in comparative studies, where the performance of
(groups of) examinees is compared.

There has been an increasing interest among assessment
and evaluation researchers in models for analysing data at an
aggregated level. This interest has initiated the formulation
of item response models for groups of subjects, such as
schools or the sexes (Bock, Mislevy & Woodson, 1982). These
group-level item response models are used in the United States of
America in large-scale assessment programs like the National
Assessment of Educational Progress (NAEP) and the Californian
Assessment Program (CAP). In the Netherlands these models are
useful to the National Institute for Educational Measurement
(CITO), especially for the Dutch National Assessment Program
of Educational Achievement (PPON).

Item response models at subject-level 

As mentioned in the introduction, the development of item
response theory started with models formulated at the level
of an individual subject. In this section these item
response models and their estimation procedures will be
discussed.

The probability of a correct response $X_{vi}=1$ from an
examinee $v$ selected at random from a certain population to
item $i$ can be written as a function of the examinee's ability
$\theta_v$ and a vector of item parameters $\xi_i$:

$$P_{vi} = P(X_{vi}=1) = H_i(\theta_v, \xi_i),$$

where $H_i(\theta_v, \xi_i)$ is a continuously differentiable
function of $\theta_v$. Usually, $H_i(\theta_v, \xi_i)$ is either
the normal-ogive or the logistic curve. For the two-parameter
normal-ogive model (Lord & Novick, 1968) the probability that an
examinee $v$ with ability level $\theta_v$ passes item $i$ is given by

$$P_{vi} = P(X_{vi}=1) = \int_{-\infty}^{a_i(\theta_v - b_i)} \phi(t)\,dt,$$

where $b_i$ is the item difficulty, $a_i$ is the discrimination
parameter and $\phi(t)$ is the standard normal density function. The
normal ogive has a point of inflexion at $\theta = b_i$; at this point
the probability of a correct answer is 0.5, and the slope of
the curve is $(2\pi)^{-1/2} a_i$.

For the logistic model the probability is

$$P_{vi} = P(X_{vi}=1) = [1+\exp\{-D a_i(\theta_v - b_i)\}]^{-1},$$

where $D$ is an arbitrary scaling constant. When $D=1.7$, the
normal-ogive and the logistic item response functions are almost
equal. The logistic model is often preferred because of its
mathematical convenience.
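
The closeness of the two curves for $D=1.7$ can be checked numerically. The sketch below is only an illustration: the function names, the ability grid and the choice $a_i=1$, $b_i=0$ are assumptions made here, not part of the reviewed literature.

```python
import math

def normal_ogive(theta, a, b):
    """Two-parameter normal-ogive ICC: Phi(a * (theta - b))."""
    z = a * (theta - b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic(theta, a, b, D=1.7):
    """Two-parameter logistic ICC with scaling constant D."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

# Largest gap between the two curves on a fine ability grid
# (illustrative item with a = 1 and b = 0).
grid = [x / 100.0 for x in range(-600, 601)]
max_gap = max(abs(normal_ogive(t, 1.0, 0.0) - logistic(t, 1.0, 0.0))
              for t in grid)
print(f"max |ogive - logistic| = {max_gap:.4f}")
```

On this grid the largest absolute difference between the two response functions stays below 0.01.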

The two-parameter models can be modified to take
guessing into account. If $c_i$ denotes the guessing parameter,
i.e. the lower asymptote, the three-parameter logistic model
becomes

$$P_{vi} = c_i + (1-c_i)[1+\exp\{-D a_i(\theta_v - b_i)\}]^{-1}.$$

Much attention has been paid to the one-parameter logistic
model, also called the Rasch model. In this model all items
have the same discriminating power, i.e. $a_i$ is a constant for
all items in the test. The ICCs differ only in their
location, indicated by the item difficulty parameter $b_i$. The
Rasch model is given by

$$P_{vi} = \exp(\theta_v - b_i)/[1 + \exp(\theta_v - b_i)].$$

The advantage of the Rasch model is that the total test score 
is a sufficient statistic for the examinee's ability 
(Fischer, 1974) . 
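
The sufficiency of the total score can be made concrete with a small numerical sketch. The item difficulties and response patterns below are hypothetical values chosen for illustration; the point is that, for two patterns with the same total score, the likelihood ratio under the Rasch model does not change with ability.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

def pattern_likelihood(x, theta, b):
    """Likelihood of response pattern x under local independence."""
    out = 1.0
    for xi, bi in zip(x, b):
        p = rasch_p(theta, bi)
        out *= p if xi == 1 else 1.0 - p
    return out

b = [-1.0, 0.0, 1.5]            # hypothetical item difficulties
x1, x2 = [1, 1, 0], [0, 1, 1]   # two patterns with total score 2

def likelihood_ratio(theta):
    """Ratio of the two pattern likelihoods at ability theta."""
    return pattern_likelihood(x1, theta, b) / pattern_likelihood(x2, theta, b)

for theta in (-2.0, 0.0, 3.0):
    print(theta, likelihood_ratio(theta))
```

The ratio depends only on the item difficulties (here $\exp(b_3 - b_1)$), so once the total score is known the pattern itself carries no further information about ability.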

Estimation 

In this section a short review of the available estimation
procedures for item response theory models will be given.
Some of the advantages of these procedures will also be
discussed. First the most often used procedure, joint
maximum likelihood estimation, will be described.

Joint Maximum Likelihood Estimation (JML) 

Let the $(N \times n)$ matrix $U$ contain the responses of $N$
examinees on $n$ items, in such a way that

$$U = [x_1, x_2, \ldots, x_N]',$$

where $x_v$ is a column vector which contains the responses $x_{vi}$
of examinee $v$ to all $n$ items. Under local independence the
likelihood function is

$$L(U \mid \theta_1, \theta_2, \ldots, \theta_N, \xi_1, \xi_2, \ldots, \xi_n) = \prod_{v=1}^{N} L(x_v \mid \theta_v) = \prod_{v=1}^{N} \prod_{i=1}^{n} P_{vi}^{x_{vi}} (1-P_{vi})^{1-x_{vi}},$$

where $\xi_i$ is a vector containing the item parameters of item
$i$. To calculate the maximum likelihood estimates of
$\theta = (\theta_1, \theta_2, \ldots, \theta_N)$ and $\xi_i$ (for
$i=1,\ldots,n$), the following likelihood equations have to be solved:

$$\partial \ln L/\partial m_k = 0,$$

where $m_k$ is the $k$-th element of the vector
$m = [\theta, \xi_1, \xi_2, \ldots, \xi_n]$. In the three-parameter
model $m$ contains $N+3n$ elements and, because of the indeterminacy
in the models, $N+3n-2$ parameters have to be estimated.

There are some problems limiting the use of JML; see
Hambleton & Swaminathan (1985, p. 135) for a discussion of
these. The main problems are that solving a system of so many
nonlinear equations takes a lot of computing time and that
the parameter estimates may take on values outside the
accepted range. A more fundamental problem with JML
estimation is that the item parameters are not estimated
consistently. When simultaneous estimation of item and
ability parameters is attempted, the number of ability
parameters increases with the addition of each examinee.
Therefore the estimators of the (structural) item parameters
will not always converge to their true values, since the number
of (incidental) ability parameters increases too.

Conditional Maximum Likelihood Estimation (CML)
Conditional maximum likelihood estimation is based on the
availability of a sufficient statistic for the ability
parameters. In the Rasch model the total test score $T_v$ is a
sufficient statistic for the ability parameter. Since the
Rasch model is a member of the exponential family (Fischer,
1974) the conditional probability of observing the response
vector, given the total score, does not depend on the ability
parameter $\theta_v$:

$$P(X_v = x_v \mid T_v = t_v) = \frac{P(X_v = x_v, T_v = t_v)}{P(T_v = t_v)} = \frac{\prod_{i=1}^{n} [\exp(-b_i)]^{x_{vi}}}{\sum_{\{x \mid \sum_i x_i = t_v\}} \prod_{i=1}^{n} [\exp(-b_i)]^{x_i}},$$

where $\{x \mid \sum_i x_i = t_v\}$ is the set of all possible
response patterns $x = [x_1, x_2, \ldots, x_n]$ with total sum score
$t_v$. It can easily be seen that examinees with all items wrong or
all items correct have to be eliminated from the sample, since in
that case there is only one response pattern $x$ with that sum score
and the conditional probability equals one.

The estimators of the item parameters obtained by maximizing this
conditional likelihood are consistent and have an asymptotic normal
distribution (Andersen, 1970).
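
A minimal sketch of the conditional probability for the Rasch model is given below. The denominator is an elementary symmetric function of the quantities $\exp(-b_i)$, computed here with a standard recurrence; the function names and the three item difficulties are illustrative assumptions.

```python
import itertools
import math

def elementary_symmetric(eps):
    """gamma[t] = sum over all patterns with score t of prod eps_i**x_i."""
    gamma = [1.0] + [0.0] * len(eps)
    for e in eps:
        # Update from high to low score so each item is used at most once.
        for t in range(len(eps), 0, -1):
            gamma[t] += e * gamma[t - 1]
    return gamma

def conditional_prob(x, b):
    """P(X = x | T = sum(x)) under the Rasch model; free of ability."""
    eps = [math.exp(-bi) for bi in b]
    num = math.prod(e for e, xi in zip(eps, x) if xi == 1)
    return num / elementary_symmetric(eps)[sum(x)]

b = [-1.0, 0.0, 1.5]  # hypothetical item difficulties
score_two = [x for x in itertools.product((0, 1), repeat=len(b))
             if sum(x) == 2]
probs = {x: conditional_prob(x, b) for x in score_two}
print(probs)
```

For the all-wrong and all-correct patterns the function returns one whatever the item difficulties are, which is why such examinees contribute nothing to the conditional likelihood.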

After the estimation of the item parameters, the ability
parameters are commonly estimated by substitution of the item
parameter estimates in the Rasch model. The precise effect of
using the estimated parameters instead of the true parameter
values is not known (Engelen, 1989).

Marginal Maximum Likelihood Estimation (MML)
In the marginal approach, it is assumed that there exists an
ability distribution $F$ and that the ability of a randomly
selected examinee is a realization of this distribution $F$.
The probability of observing any response pattern, given
the population, can be evaluated by integrating over the
population density. So

$$P(X = x \mid F, \xi) = \int P(X = x \mid \theta, \xi)\, dF(\theta) \equiv \pi_x,$$

where $\xi = [\xi_1, \xi_2, \ldots, \xi_n]$. Note that $X$ and
$\theta$ now consist of random variables.

There are $2^n$ response patterns and if $N_x$ is the number
of examinees with response pattern $x$, then the loglikelihood
function is given by (Hambleton & Swaminathan, 1985)

$$\ln(L) = \sum_{x=1}^{2^n} N_x \ln \pi_x + \text{constant}.$$

There is no ability parameter anymore in this likelihood
function, so the maximum likelihood estimators are obtained
by differentiating ln(L) with respect to the item parameters,
setting the derivatives equal to zero and solving the equations.

Numerical issues can be a problem in solving these
equations. Another problem is the ability distribution $F$.
However, Engelen (1987a) showed for the Rasch model that one
can estimate the ability distribution jointly with the item
parameters, without making any assumption on the ability
distribution.

The advantages of MML over CML estimation are that no
examinees have to be eliminated from the data and that it is
also applicable to the two- or three-parameter logistic
model. A disadvantage of MML estimation is that no estimators
of the individual ability parameters are available, but only
information about the distribution of the ability is
obtained.
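
As a rough illustration of the marginal approach, the sketch below evaluates the marginal probability of each response pattern for the Rasch model by a simple rectangle rule. It assumes a standard normal ability distribution and hypothetical item difficulties; both are choices made here for illustration only, as are the function names and the quadrature grid.

```python
import itertools
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

def marginal_pattern_prob(x, b, n_nodes=61, lo=-6.0, hi=6.0):
    """pi_x: integral of P(x | theta) against an assumed N(0, 1) density."""
    step = (hi - lo) / (n_nodes - 1)
    total = 0.0
    for k in range(n_nodes):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta ** 2) / math.sqrt(2 * math.pi) * step
        like = 1.0
        for xi, bi in zip(x, b):
            p = rasch_p(theta, bi)
            like *= p if xi == 1 else 1.0 - p
        total += weight * like
    return total

def marginal_loglik(counts, b):
    """Sum over patterns of N_x * ln(pi_x), i.e. ln(L) up to a constant."""
    return sum(n * math.log(marginal_pattern_prob(x, b))
               for x, n in counts.items())

b = [-0.5, 0.0, 1.0]  # hypothetical item difficulties
patterns = list(itertools.product((0, 1), repeat=len(b)))
pi = {x: marginal_pattern_prob(x, b) for x in patterns}
print(sum(pi.values()))
```

Note that the loglikelihood contains item parameters only: ability has been integrated out, which is why no individual ability estimates result.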

Logit-based parameter estimation

An important reason for investigating the possibilities of
logit-based parameter estimation is the expected low computer
costs of the procedures. Logit-based parameter estimation has
been explored by Verhelst and Molenaar (1988) for the Rasch
model and by Baker (1987) for the two-parameter logistic
model.

Verhelst and Molenaar (1988) transform an initial consistent
estimator into an asymptotically efficient one. Let
$L_N(\theta)$ be the log-likelihood function of parameter $\theta$
and let

$$A_N(\theta) = \partial L_N(\theta)/\partial\theta.$$

Assume that $A_N(\theta)$, suitably normalized, is asymptotically
normally distributed $N(0, I_\theta)$, with $I_\theta$ the Fisher
information matrix per single observation. If $\theta_N^{(0)}$ (the
starting value) is any $\sqrt{N}$-consistent estimator, then

$$\theta_N^{(1)} = \theta_N^{(0)} + [I_{\theta_N^{(0)}}]^{-1} A_N(\theta_N^{(0)})$$

is asymptotically normal with

$$\sqrt{N}\,(\theta_N^{(1)} - \theta) \to N(0, I_\theta^{-1}).$$

Since all persons with the same raw score will end up
with the same $\theta$-estimate, they can be treated as having the
same ability value. This notion is used by Verhelst and
Molenaar (1988) and by Baker (1987) to introduce least-squares
logit estimation.

In the case of the Rasch model the logit model is

$$\mathrm{logit}\, p_{i|s} = \theta_s - b_i,$$

in which $p_{i|s}$ denotes the probability of a person with score
$s$ answering item $i$ correctly and $\theta_s$ the ability of persons
with $s$ items answered correctly. Verhelst and Molenaar (1988)
note that this model is not the same as the Rasch model,
because in regression models the observed variables are
functionally independent of the dependent variable while in
the Rasch model they are completely dependent. Verhelst and
Molenaar (pp. 288-292) compared weighted least-squares (WLS)
estimators with CML estimators in some data settings. The WLS
ability estimates sometimes failed to increase with total
score. However, for simulated (perfect) data the WLS
estimates multiplied by a constant came very close to the CML
estimates.

Baker (1987) organized the data for each item in an $s \times 2$
contingency table. Here $s$ denotes the number of ability
groups, with group $j$ having midpoint $\theta_j$ and containing
$f_j$ examinees. For the two-parameter logistic model the logit of
$P_{ij}$, the probability of a correct response to item $i$ in
ability group $j$, is given by

$$\mathrm{logit}(P_{ij}) = a_i(\theta_j - b_i).$$

Baker used a two-stage iterative procedure for the joint
estimation of item and ability parameters. In the first stage
the ability parameters are replaced by their estimates and

$$\chi^2 = \sum_{j=1}^{s} f_j\, p_{ij}(1-p_{ij}) \{\log[p_{ij}/(1-p_{ij})] - a_i(\theta_j - b_i)\}^2,$$

with $p_{ij}$ the observed proportion of correct responses to item
$i$ in group $j$, is minimized to estimate the item parameters. In
the second stage

$$\chi^2 = \sum_{i=1}^{n} f_j\, p_{ij}(1-p_{ij}) \{\log[p_{ij}/(1-p_{ij})] - a_i(\theta_j - b_i)\}^2$$

is minimized to estimate $\theta_j$ for each ability group
separately, while the item parameters are replaced by
their estimates. Stages 1 and 2 are repeated until a
convergence criterion is reached. Baker performed a
simulation study in which the results improved as test length
and sample size increased, and when the test difficulty
and the group ability were matched. Surprisingly, although
the item parameters were sometimes poorly estimated, the
ability estimates correlated highly with the underlying ability
parameters.

In conclusion, though logit-based parameter estimation
in item response theory is less expensive than ML estimation,
the estimates are also less precise.
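
The item-parameter stage of a minimum logit chi-square procedure reduces to a weighted linear regression of the observed logits on the group midpoints. The sketch below shows this for a single item, writing the logit in the usual two-parameter form $a(\theta_j - b)$. It is not Baker's program: the function name and the data are hypothetical, and all observed proportions are assumed to lie strictly between 0 and 1.

```python
import math

def min_logit_chisq_item(theta, f, p):
    """Weighted least-squares fit of logit(p_j) = a * (theta_j - b).

    theta: ability-group midpoints, f: group sizes,
    p: observed proportions correct (each strictly between 0 and 1).
    """
    w = [fj * pj * (1.0 - pj) for fj, pj in zip(f, p)]   # chi-square weights
    y = [math.log(pj / (1.0 - pj)) for pj in p]          # observed logits
    sw = sum(w)
    mt = sum(wi * ti for wi, ti in zip(w, theta)) / sw   # weighted mean theta
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw       # weighted mean logit
    sxy = sum(wi * (ti - mt) * (yi - my) for wi, ti, yi in zip(w, theta, y))
    sxx = sum(wi * (ti - mt) ** 2 for wi, ti in zip(w, theta))
    a = sxy / sxx          # slope = discrimination
    b = mt - my / a        # from intercept -a*b = my - a*mt
    return a, b

# Hypothetical noise-free data generated with a = 1.2 and b = 0.3.
theta = [-2.0, -1.0, 0.0, 1.0, 2.0]
f = [50, 100, 200, 100, 50]
p = [1.0 / (1.0 + math.exp(-1.2 * (t - 0.3))) for t in theta]
a_hat, b_hat = min_logit_chisq_item(theta, f, p)
print(a_hat, b_hat)
```

With error-free proportions the generating values are recovered; with real data the weights $f_j p_{ij}(1-p_{ij})$ downweight groups whose proportions are close to 0 or 1.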

Bayesian approach

In the Bayesian approach prior distributions are imposed on
the parameters of interest. Then, after the data are obtained,
Bayes' theorem is used to compute the posterior distribution.

Bayesian estimation starts with the specification of a 
certain parametric prior distribution or with the 
specification of empirical priors estimated from the data. 
Hierarchical Bayesian estimation arises if a distribution is 
specified for the parameters in the prior distribution. 

The hierarchical Bayesian estimation procedure will be
discussed in further detail because of its flexibility.
However, the objection against Bayes' procedures that no
empirical evidence for the choice of the priors is given
still applies to some extent. Here hierarchical Bayesian
estimation will be illustrated considering the three-parameter
model (as in Hambleton & Swaminathan, 1985)

$$P_{vi} = c_i + (1-c_i)\{1+\exp[-D a_i(\theta_v - b_i)]\}^{-1}.$$

Let $f(\theta_v)$ be the prior belief about the ability of examinee
$v$ ($v=1,2,\ldots,N$) and let $f(a_i)$, $f(b_i)$ and $f(c_i)$ be the
prior beliefs about the parameters of item $i$ ($i=1,2,\ldots,n$). The
joint posterior density of the parameters $\theta$, $a$, $b$ and $c$,
given the data $x$, is proportional to

$$L(x \mid \theta, a, b, c) \prod_{i=1}^{n} f(a_i) f(b_i) f(c_i) \prod_{v=1}^{N} f(\theta_v).$$

It is necessary to take into account the restrictions of the
parameter considered when specifying the prior. For example,
since $a_i$ is generally positive, an appropriate prior for $a_i$
would be the chi-square distribution. The next stage is to
specify the distributions of the parameters of the prior
distribution. Once these distributions are specified, the
values of the parameters $\theta$, $a$, $b$ and $c$ that maximize
the joint posterior distribution can be obtained.

The hierarchical Bayes' procedure yields good results,
even in cases where maximum likelihood estimation performs
rather badly (Hambleton & Swaminathan, 1985; Engelen, 1987b).
Other estimation procedures

Under the assumption that the two-parameter model fits the
data and the ability is N(0,1) distributed, one may consider
the procedure described by Lord & Novick (1968, ch. 16.10)
using point biserial correlation coefficients.

For the Rasch model other procedures are available.
Minimum chi-square estimation is such a procedure, proposed
by Fischer (1974). Let $N_{ij}$ denote the number of examinees
that answer item $i$ correctly and item $j$ wrongly and let $N_{ji}$
be the number of examinees that answer item $j$ correctly but item
$i$ wrongly. If the Rasch model fits the data, then approximately

$$N_{ij}/N_{ji} = \exp(-b_i)/\exp(-b_j) = \exp(b_j - b_i).$$

Let $\delta_i = \exp(b_i)$, then

$$\sum_{i<j} (N_{ij}\delta_i - N_{ji}\delta_j)^2 / [\delta_i \delta_j (N_{ij} + N_{ji})]$$

is the quantity to be minimized.
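
The criterion can be sketched as a small function of the pairwise counts. Everything in the example (function name, difficulties, the 100 informative examinees per item pair) is a hypothetical illustration; the counts are set to their expected values under the Rasch model, so the criterion should vanish at the generating difficulties.

```python
import math

def pairwise_chisq(counts, b):
    """Sum over item pairs i < j of
    (N_ij*d_i - N_ji*d_j)**2 / (d_i*d_j*(N_ij + N_ji)), with d_i = exp(b_i).

    counts[i][j] is the number of examinees with item i right, item j wrong.
    """
    d = [math.exp(bi) for bi in b]
    total = 0.0
    n = len(b)
    for i in range(n):
        for j in range(i + 1, n):
            nij, nji = counts[i][j], counts[j][i]
            total += (nij * d[i] - nji * d[j]) ** 2 / (d[i] * d[j] * (nij + nji))
    return total

# Hypothetical expected counts: 100 examinees per pair answer exactly one
# of the two items correctly; item i is the correct one with probability
# d_j / (d_i + d_j).
b_true = [-0.5, 0.0, 0.8]
d_true = [math.exp(x) for x in b_true]
m = 100.0
counts = [[0.0] * 3 for _ in range(3)]
for i in range(3):
    for j in range(3):
        if i != j:
            counts[i][j] = m * d_true[j] / (d_true[i] + d_true[j])

print(pairwise_chisq(counts, b_true), pairwise_chisq(counts, [0.0, 0.0, 0.0]))
```

Adding the same constant to all difficulties leaves the criterion unchanged, so in practice one restriction (for instance a fixed sum of the $b_i$) is needed before minimizing.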

Rewriting the Rasch model as a model for paired
comparisons with ties leads to estimation by paired
comparison. Here the responses of an examinee $v$ to a pair of
items are compared. These response patterns give information
about the relative difficulty of the two items for examinee
$v$. For more details, see Engelen (1987b).
Item response theory for aggregated data
Introduction

According to Mislevy and Reiser (1983), there are two
dimensions along which an application of IRT to large-scale
assessment settings can vary: (1) the level at which an item
response model is defined and (2) the level at which ability
estimates are produced. The marginal maximum likelihood
estimation procedure maintains the subject-level definition
of an item response model, but just gives information about
the ability distribution in the sample. In this section the
focus will be on the group-level definition of item response
models and their relationship to the more familiar
subject-level models.

In contrast to item response models for the subject level,
a group-level item response model does not describe
the probability of a response to an item from a specific
examinee, but describes the probability of a response from an
examinee selected at random from a specific group. By groups
are meant salient groups: segments of a population
(subpopulations) that can easily be identified by, for
example, sex, race, socioeconomic class or degree of
urbanization. Salient groups make it possible to decide on
curriculum issues concerning certain subpopulations.
Furthermore, the items are classified in narrowly defined
skill domains.
Item response models for groups

The probability of a correct response $x_{vgi}$ to item $i$ by an
examinee $v$, selected at random from a subpopulation $g$, can be
written as a function of $\theta_g$, the "ability" level of that
subpopulation, and the item parameters $\xi_i$:

$$P_{gi} = P(X_{vgi}=1) = H_i(\theta_g, \xi_i).$$

$H_i(\theta_g, \xi_i)$ is a (with respect to $\theta_g$) continuously
differentiable and generally monotonically increasing function
ranging from 0 to 1. Furthermore, $N_{gi}$ is the frequency of
attempts to answer item $i$ by members from group $g$, out of which
$R_{gi}$ were correct responses. The probability of observing the
vector $R_g = (R_{g1}, R_{g2}, \ldots, R_{gn})$ of correct responses
among $N_g = (N_{g1}, N_{g2}, \ldots, N_{gn})$ attempts can be
written as

$$\prod_{i=1}^{n} \binom{N_{gi}}{R_{gi}} P_{gi}^{R_{gi}} (1-P_{gi})^{N_{gi}-R_{gi}}.$$

It is assumed that the responses of different examinees, given
the attainment level of the subpopulation $g$, are independent.

The following part is heavily based on Mislevy (1983),
who shows under what conditions group-level item response
models with $H_i(\theta_g, \xi_i)$ are implied by subject-level item
response models with $H_i(\theta_{vg}, \xi_i)$.

Let $H_i(\theta_{vg}, \xi_i)$ be the subject-level item response
curve of item $i$. Let $E_i$ be a continuous random nuisance variable
with mean zero and density function $f_i$. The value of the
response of a randomly selected member $v$ of group $g$, $X_{vgi}$, is
assumed to depend on the fixed item threshold value $\beta_i$ and
the person's ability $\theta_{vg}$. The possible values of this
response are defined as follows:

$X_{vgi}=1$ if $\theta_{vg} + E_i > \beta_i$, or equivalently
if $h_i \equiv E_i + (\theta_{vg} - \theta_g) > \beta_i - \theta_g$;

$X_{vgi}=0$ otherwise.

Let $d_i$ be the density function of $h_i$. The probability of
a correct answer to item $i$ by a random member $v$ from group $g$
is then given as

$$P(X_{vgi}=1 \mid \theta_g, \beta_i) = \int_{\beta_i-\theta_g}^{\infty} d_i(h)\,dh \equiv H_i(\theta_g - \beta_i) = H_i(\theta_g, \xi_i),$$

where $\xi_i$, again, is the vector containing the item parameters
of item $i$.

Since ability only appears in the form of the mean group
ability, it is assumed that all subpopulations have the same
ability distribution except for location. This assumption of
homoscedasticity is a strong one and needs to be tested.

To test the assumption of homoscedasticity, the item
parameters of the subject-level item response model need to
be known or estimated. This means that at least two responses
have to be elicited from each examinee. All the within-group
ability distributions should belong to the same known
parametric distribution. The group ability parameters, too,
should follow a known parametric distribution. For more
details about procedures and tests see Mislevy (1984).

So, recapitulating, $H_i(\theta_g, \xi_i)$ is a group-level item
response curve for item $i$ under the assumption of the
subject-level item response curve $H_i(\theta_{vg}, \xi_i)$ and equal
ability distributions within groups except for location.

Except in some special cases no simple closed-form
expression may exist for $d_i$ and $H_i(\theta_g, \xi_i)$. These
exceptions are: (1) the group fixed-effects model, (2) the two- and
three-parameter normal-normal model, (3) normal-logistic models and
(4) the CAP model. These models will be discussed in the
following sections.

The group fixed-effects model

Reiser (1980, 1983) suggests the group fixed-effects model,
where it is assumed that grouping accounts for all variation
among examinees. So, $\theta_{vg} = \theta_g$ for $v=1,2,\ldots,N$ and
$g=1,2,\ldots,m$. Because it is assumed that each examinee responds
to only one item in the item domain, variability at the subject
level is considered as independent within-group error. The model is
formulated as a logit model where

$$z_{gi} = \log\frac{P(X_{vgi}=1)}{P(X_{vgi}=0)} = \log\frac{P(X_{vgi}=1)}{1-P(X_{vgi}=1)}$$

and

$$z_{gi} = b_i + k_g'\theta\, a_i.$$

Here $b_i$ and $a_i$ are the item parameters, $k_g'$ is a $1 \times m$
row vector from a design matrix $K$ and $\theta$ is an $m \times 1$
vector of contrasts to be estimated among the sampled groups. The
product $k_g'\theta$ specifies a weighted combination of effects from
$\theta$ to produce the relative scale position of group $g$
($g=1,2,\ldots,m$).

The log likelihood for the given data is

$$\log L = \sum_{g=1}^{m} \sum_{i=1}^{n} \{R_{gi} \log P(X_{vgi}=1 \mid \theta, a_i, b_i) + (N_{gi} - R_{gi}) \log[1 - P(X_{vgi}=1 \mid \theta, a_i, b_i)]\} + \text{const.},$$

where $R_{gi}$ is the number of correct responses in group $g$ on
item $i$ and $N_{gi}$ is the number of examinees in group $g$ who
respond to item $i$. Parameters are estimated by an iterative
procedure using Fisher's efficient scoring method, i.e.:

$$\begin{bmatrix} \hat{b} \\ \hat{a} \\ \hat{\theta} \end{bmatrix}_{t+1} = \begin{bmatrix} \hat{b} \\ \hat{a} \\ \hat{\theta} \end{bmatrix}_{t} + \{I_t(b, a, \theta)\}^{-1} \begin{bmatrix} \partial l/\partial b \\ \partial l/\partial a \\ \partial l/\partial \theta \end{bmatrix},$$

where $l = \log L$ and $t$ is the iteration step. If
$I_t(b, a, \theta)$ is not of full rank, the method does not
converge. Asymptotic standard errors of the estimators are
available as functions of the diagonal elements of
$[I_t(b, a, \theta)]^{-1}$.

Goodness of fit of the model can be assessed using
Pearson's chi-square or the likelihood ratio statistic.

Two- and three-parameter normal-normal model
Normal-normal indicates that both the subject-level item
response density function and the subpopulation ability
density function $g_g$ are normal, in which case the group-level
item response density functions are normal as well (Mislevy,
1983).

Let $c_i$ be the guessing parameter, $\beta_i$ the item threshold
and $\sigma_i$ the standard deviation of item $i$ in a subject-level
normal-ogive three-parameter model. The probability of
observing a correct response to item $i$ by an examinee with
ability $\theta_{vg}$ is given as

$$P(X_{vgi}=1) = c_i + (1-c_i)\,\Phi[(\theta_{vg} - \beta_i)/\sigma_i],$$

where $\Phi$ is the standard normal distribution function.
Within the groups, $\theta_{vg}$ is normally distributed with mean
$\theta_g$ and variance $\sigma_g^2$. The probability of observing a
correct answer of a randomly selected person from group $g$ is then
equal to

$$P(X_{gi}=1 \mid \theta_g, c_i, \beta_i, \sigma_i) = \int P_i(\theta)\, g_g(\theta)\, d\theta = c_i + (1-c_i)\,\Phi[(\theta_g - \beta_i)/\sqrt{\sigma_i^2 + \sigma_g^2}].$$

Results for the two-parameter normal-ogive model at group level
follow as a special case of the three-parameter model
in which $c_i = 0$.
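
The closed form for the normal-normal case can be checked against direct numerical integration of the subject-level curve over the within-group ability distribution. The sketch below uses a plain rectangle rule; the parameter values, function names and grid settings are illustrative assumptions.

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def group_prob_closed_form(theta_g, c, beta, sigma_i, sigma_g):
    """Group-level three-parameter normal-normal response probability."""
    scale = math.sqrt(sigma_i ** 2 + sigma_g ** 2)
    return c + (1.0 - c) * Phi((theta_g - beta) / scale)

def group_prob_numeric(theta_g, c, beta, sigma_i, sigma_g,
                       n_nodes=2001, width=8.0):
    """Integrate the subject-level curve over a N(theta_g, sigma_g^2) group."""
    lo = theta_g - width * sigma_g
    step = 2.0 * width * sigma_g / (n_nodes - 1)
    total = 0.0
    for k in range(n_nodes):
        th = lo + k * step
        dens = (math.exp(-0.5 * ((th - theta_g) / sigma_g) ** 2)
                / (sigma_g * math.sqrt(2.0 * math.pi)))
        subject_p = c + (1.0 - c) * Phi((th - beta) / sigma_i)
        total += subject_p * dens * step
    return total

# Hypothetical group and item parameters.
args = dict(theta_g=0.4, c=0.2, beta=-0.3, sigma_i=0.8, sigma_g=1.1)
print(group_prob_closed_form(**args), group_prob_numeric(**args))
```

The two numbers agree to numerical precision, which illustrates why the group-level curve is flatter than the subject-level one: the within-group variance is added to the item variance.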

Normal-logistic model 

Mislevy (1983) shows how homoscedastic normal groups and a
subject-level two- or three-parameter normal-ogive item
response model imply the existence of a corresponding
group-level item response model. There is no similar result for
logistic item response models, because the convolution of a
logistic density with another logistic or normal distribution
does not result in either a logistic or a normal density.
There is, however, a possibility of approximating the
logistic distribution with a normal one by
$\Phi(z) \approx \Psi(1.7z)$, where $\Psi$ is the logistic
distribution function. In that case a logistic subject-level item
response model is assumed to fit with item parameters $\beta_i$,
$\sigma_i$ and $c_i$. This subject-level item response model is
approximated by a normal subject-level item response model with
item parameters $\beta_i$, $1.7\sigma_i$ and $c_i$. If
ability is assumed to be normally distributed in the
subpopulations, then the procedure of the previous section
can be followed, resulting in an approximate group-level item
response model.
Californian Assessment Program model

Finally, the basic model in the Californian Assessment
Program (Mislevy & Bock, 1984) is considered. This model is
formulated at the level of detail necessary for diagnosing
curricular effects: school level and skill element. Again the design
permits every examinee to answer only one item in each skill
domain. The probability of a random examinee $v$ from school $g$
answering item $i$ correctly is equal to

$$P(X_{vgi}=1) = \frac{\exp[(\theta_g-\beta_i)/\sigma_i]}{1 + \exp[(\theta_g-\beta_i)/\sigma_i]} = \Psi[(\theta_g-\beta_i)/\sigma_i].$$

Here $\theta_g$ is the average ability level of examinees in school
$g$ for the skill element of interest. Item parameters $\beta_i$ and
$\sigma_i$ are the item threshold and dispersion, respectively. The
probability of a school pattern of numbers of correct attempts
$R_g = [R_{g1}, R_{g2}, \ldots, R_{gn}]$, given the total numbers of
attempts $N_g = [N_{g1}, N_{g2}, \ldots, N_{gn}]$, is

$$P(R_g \mid N_g, \theta_g, \beta, \sigma) = \prod_{i=1}^{n} \binom{N_{gi}}{R_{gi}} P_{gi}^{R_{gi}} (1-P_{gi})^{N_{gi}-R_{gi}},$$

with $P_{gi} = P(X_{vgi}=1)$.

This equation is essential in the parameter estimation
procedure. If this equation is employed in a design wherein
an examinee might sometimes respond to more than one item,
the school and item estimates are consistent but the
resulting standard error of estimation would tend to be a
little too small (Mislevy & Bock, 1984, p. 7).
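
The product-binomial probability of a school pattern is straightforward to evaluate on the log scale. The sketch below is a hedged illustration with hypothetical function names and data; it is not the estimation program used in the Californian Assessment Program.

```python
import math

def cap_prob(theta_g, beta, sigma):
    """Logistic group-level response probability Psi((theta_g - beta)/sigma)."""
    return 1.0 / (1.0 + math.exp(-(theta_g - beta) / sigma))

def school_pattern_logprob(R, N, theta_g, betas, sigmas):
    """Log of the product-binomial probability of correct counts R given
    attempts N for one school, assuming one response per examinee per item."""
    total = 0.0
    for r, n, beta, sigma in zip(R, N, betas, sigmas):
        p = cap_prob(theta_g, beta, sigma)
        total += (math.log(math.comb(n, r))
                  + r * math.log(p) + (n - r) * math.log(1.0 - p))
    return total

# Hypothetical school: three items in one skill element.
R, N = [7, 12, 3], [10, 20, 9]
betas, sigmas = [-0.5, 0.0, 1.0], [1.0, 1.2, 0.9]
for theta_g in (-0.5, 0.0, 0.5):
    print(theta_g, school_pattern_logprob(R, N, theta_g, betas, sigmas))
```

Evaluating the function over a range of $\theta_g$ values, with the item parameters held fixed, shows where the school-level likelihood peaks.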

The estimation procedure needs the assumption that the 
distribution of school scores in the sample is approximately 
normal, but it need not be assumed that the distribution is 
approximately normal in the population itself. Furthermore, 
the estimation procedure is based on the assumption that the 
model holds and uses the marginal maximum likelihood 
approach. After the calibration of the items a goodness-of-fit
test is applied to evaluate this assumption.



The relation between group-level and 
subject-level item response models 

Group-level item response models may be justified as
aggregate descriptions of item response models at the subject
level and interpreted analogously. Group-level item response
models are implied by subject-level item response models only
when within-group ability distributions are identical except
for location (Mislevy, 1983).

In the context associated with the previously described
models, every examinee answers only one item of each skill
domain; hence individual ability levels cannot be estimated.
Even if some distinguished skill domains can be considered as
one latent trait, there are still too few observations of
each examinee, and ability estimates will have a considerable
measurement error. A more complex design with each examinee
taking a few items per skill domain would provide more
reliable estimates. In this design only one observation of
each examinee will be used to estimate the group ability
parameter.

Both at subject and group level, parameters are
undetermined in their scale, and in order to eliminate these
indeterminacies some parameters (the number depends on the
item response model) could be fixed arbitrarily. However,
there is an important difference too. For the subject level
the addition of an examinee increases the number of
incidental (ability) parameters. For the group level,
however, the number of ability parameters does not increase.

If a test indicates that the homoscedasticity assumption
is not realistic, the detection of aberrant response patterns
becomes of particular interest. But if only one observation is
available, procedures used at the subject level as described by
Kogut (1987a, 1987b, 1988) are not applicable.

So future research should try to find closed-form
expressions for a group-level item response model with less
severe restrictions on the ability distribution within
groups. Homoscedasticity tests and methods for detecting
aberrant response patterns should be refined and adapted.




References 

Andersen, E.B. (1970). Asymptotic properties of conditional 
maximum likelihood estimates. The Journal of the Royal 
Statistical Society, Series B, 32, 283-301. 

Baker, F.B. (1987). Item parameter estimation via minimum 
logit chi-square. British Journal of Mathematical and 
Statistical Psychology, 40, 50-60. 

Bock, R.D., Mislevy, R. & Woodson, C. (1982). The next stage in 
educational assessment. Educational Researcher, 11(3), 
4-11. 

Engelen, R.J.H. (1987a). Semiparametric estimation in the 
Rasch model (research report 87-1). Enschede: 
Universiteit Twente. 

Engelen, R.J.H. (1987b). A review of different estimation 
procedures in the Rasch model (research report 87-6). 
Enschede: Universiteit Twente. 

Engelen, R.J.H. (1989). Parameter estimation in the logistic 
item response model (thesis). Enschede: University of 
Twente. 

Fischer, G. (1974). Einführung in die Theorie psychologischer 
Tests. Bern/Stuttgart/Wien: Verlag Hans Huber. 

Hambleton, R.K. & Swaminathan, H. (1985). Item response theory: 
principles and applications. Boston/Dordrecht/Lancaster: 
Kluwer-Nijhoff Publishing. 

Kogut, J. (1987a). Detecting aberrant response patterns in the 
Rasch model (rapport 87-3). Enschede: Universiteit 
Twente, Faculteit der Toegepaste Onderwijskunde. 



Kogut, J. (1987b). Reduction of bias in Rasch estimates due to 
aberrant patterns (rapport 87-5). Enschede: Universiteit 
Twente, Faculteit der Toegepaste Onderwijskunde. 

Kogut, J. (1988). Asymptotic distribution of an IRT person fit 
index (Research Report 88-13). Enschede: Universiteit 
Twente, Faculteit der Toegepaste Onderwijskunde. 

Lord, F.M. & Novick, M.R. (1968). Statistical theories of 
mental test scores. Reading (Mass.)/London etc.: Addison- 
Wesley Publishing Company, Inc. 

Mislevy, R.J. (1983). Item response models for grouped data. 
Journal of Educational Statistics, 8, 271-288. 

Mislevy, R.J. (1984). Estimating latent distributions. 
Psychometrika, 359-381. 

Mislevy, R.J. & Bock, R.D. (1984). A technical description of 
the procedures used in calculating school-level scaled 
scores for the "Survey of basic skills: grade 6". 
California State Department of Education, Sacramento, 
CA. 

Mislevy, R.J. & Reiser, M.R. (April 1983). Item response 
methods for educational assessment. Paper presented at 
the annual meeting of the American Educational Research 
Association, Montreal, Quebec. 

Reiser, M. (1980). A latent trait model for group effects 
(thesis: department of behavioural sciences, committee 
on methodology of behavioural research). The University 
of Chicago, Illinois. 



Reiser, M. (1983). An item response model for the estimation 
of demographic effects. Journal of Educational 
Statistics, 8, 165-186. 

Verhelst, N. & Molenaar, I.W. (1988). Logit based parameter 
estimation in the Rasch model. Statistica Neerlandica, 
42, 273-295. 



Titles of recent Research Reports from the Division of 
Educational Measurement and Data Analysis, 
University of Twente, Enschede, 
The Netherlands. 

RR-90-2   H. Tobi, Item Response Theory at subject- and group-level 

RR-90-1   P. Westers & H. Kelderman, Differential item functioning in 
          multiple choice items 

RR-89-6   J.J. Adema, Implementations of the Branch-and-Bound method 
          for test construction problems 

RR-89-5   H.J. Vos, A simultaneous approach to optimizing treatment 
          assignments with mastery scores 

RR-89-4   M.P.F. Berger, On the efficiency of IRT models when applied 
          to different sampling designs 

RR-89-3   D.L. Knol, Stepwise item selection procedures for Rasch 
          scales using quasi-loglinear models 

RR-89-2   E. Boekkooi-Timminga, The construction of parallel tests from 
          IRT-based item banks 

RR-89-1   R.J.H. Engelen & R.J. Jannarone, A connection between 
          item/subtest regression and the Rasch model 

RR-88-18  H.J. Vos, Applications of decision theory to computer based 
          adaptive instructional systems 

RR-88-17  H. Kelderman, Loglinear multidimensional IRT models for 
          polytomously scored items 

RR-88-16  H. Kelderman, An IRT model for item responses that are 
          subject to omission and/or intrusion errors 

RR-88-15  H.J. Vos, Simultaneous optimization of decisions using a 
          linear utility function 

RR-88-14  J.J. Adema, The construction of two-stage tests 

RR-88-13  J. Kogut, Asymptotic distribution of an IRT person fit index 

RR-88-12  E. van der Burg & G. Dijksterhuis, Nonlinear canonical 
          correlation analysis of multiway data 

RR-88-11  D.L. Knol & M.P.F. Berger, Empirical comparison between 
          factor analysis and item response models 

RR-88-10  H. Kelderman & G. Macready, Loglinear-latent-class models for 
          detecting item bias 

RR-88-9   W.J. van der Linden & T.J.H.M. Eggen, The Rasch model as a 
          model for paired comparisons with an individual tie parameter 

RR-88-8   R.J.H. Engelen, W.J. van der Linden, & S.J. Oosterloo, Item 
          information in the Rasch model 

RR-88-7   J.H.A.K. Rikers, Towards an authoring system for item 
          construction 

RR-88-6   H.J. Vos, The use of decision theory in the Minnesota 
          Adaptive Instructional System 

RR-88-5   W.J. van der Linden, Optimizing incomplete sample designs for 
          item response model parameters 

RR-88-4   J.J. Adema, A note on solving large-scale zero-one 
          programming problems 

RR-88-3   E. Boekkooi-Timminga, A cluster-based method for test 
          construction 

RR-88-2   W.J. van der Linden & J.J. Adema, Algorithmic test design 
          using classical item parameters 

RR-88-1   E. van der Burg & J. de Leeuw, Nonlinear redundancy analysis 

Research Reports can be obtained at costs from Bibliotheek, 
Department of Education, University of Twente, P.O. Box 217, 
7500 AE Enschede, The Netherlands. 


